library(ggplot2)
library(gridExtra)
library(ggrepel)
library(openxlsx)
This analysis was carried out to address the following criticisms:
The analysis in Cardona and Rudel was based on normalizing by plate-specific controls, rather than by the control mean over all plates. This is because Burgoon believes control means do not track the overall plate effect: essentially, that there is a plate X treatment interaction whose magnitude is so large as to render the plate-specific controls useless for normalizing out plate effects.
A greater fraction than he would suspect of chemicals identified by Cardona and Rudel as potential e2 agonists were run on plates with control levels in the lower range of the distribution of control levels.
The existence of substantial block-to-block variation in control levels totally invalidates the data, and makes the assay useless.
In response, I find that:
Twenty-six e2 chemicals and 24 p4 chemicals in this dataset were run on two plates, and two additional chemicals were run on 3. This replication gives us some degrees of freedom to ask the extent to which normalizing by plate-specific controls reduces the overall mean squared error of values around their treatment-concentration-specific means. For all but three e2 chemicals and 2 p4 chemicals, normalizing by plate-specific controls substantially reduced this mse, and the degree of mse reduction is related to the standard deviation among replicate plate DMSO values. The three e2 chemicals were the three strongest e2 agonists by far, but the two p4 chemicals were not particularly strong p4 agonists.
In the ToxCast assays, there is no real sense in which chemicals were assigned to different blocks of the assays in any sort of random order with respect to expected toxicity. Control means vary by block (see point 3), making it more likely than perhaps you would expect that more effective estrogen agonists would be assigned to plates with lower control values.
Since normalizing by plate-specific controls substantiall adjusts for plate effects, this criticism seems incorrect. Indeed, in the wider field of experimental science, it is widely expected that baseline conditions can change over replicates of an experiment, hence the general practice of including concurrent controls, and using those controls as points of comparison for treatment effects.
In addition, Burgoon based his analysis on untransformed data. This led him, in some of his simulations of data, to use truncated normal distributions to avoid negative values. The resulting simulated distributions will not have the target means and standard deviations. The analyses on which Cordona and Rudel based their analysis were based on log-transformed data (hence, all the analysis in this report were also based on log-transformed data).
e2chem <- read.delim("/home/woodrow/Projects/SilentSpring/Cardona_and_Rudel_Comments/MyAnalysis/active_withsubs_e2up_2031_594.csv")
e2dmso <- read.delim("/home/woodrow/Projects/SilentSpring/Cardona_and_Rudel_Comments/MyAnalysis/DMSO_e2_2031_594.csv")
Do I have DMSO controls for every plate in e2chem?
e2plates <- unique(e2chem$apid)
e2dmsoplates <- unique(e2dmso$apid)
if(!all(e2plates %in% e2dmsoplates)) sort(setdiff(e2plates, e2dmsoplates)) else "All There!"
## [1] "All There!"
Yes! Burgoon’s data were missing a bunch, but this dataset seems to be complete.
Burgoon et al. conducted their analysis on the raw assay values, but the analysis of Cardona and Rudel (CR) relied on the analyses of Haggard et al., which worked with natural-log-transformed values. This can be justified in a couple of different ways. First of all, Burgoon et al.’s estimate of mean and standard deviation for DMSO estrogen values give a large coefficient of variation (.38 / .44 = .86). Since the assay values must be \(\ge\) 0, when Burgoon was simulating values, he had to truncate the normal distribution he was using. His means and standard deviations were not adjusted for that, so the moments for his simulated data did not match what he wanted to simulate. Non-negative random variables become more skewed as their coefficient of variation increases. This is one reason to model with probability distributions defined on the non-negative real numbers, like the lognormal.
Lognormal can also be justified with a heuristic argument that the processes generating the randomness in the data are dominated by multiplicative processes. This is just the situation where you expect a lognormal distribution.
So, here we create a new variable in each dataframe, lrval, the log-transform of rval
e2chem$lrval <- log(e2chem$rval)
e2dmso$lrval <- log(e2dmso$rval)
It is not clear how this dataset relates to the chemicals identified by CR. Here I read the e2 related sheet of CR’s supplemental materials, and compare the list of chemicals to that in e2chem. Filter to just include rows with Efficacy/potency in {“higher”, “intermediate”, “lower”}
CRe2chems <- openxlsx::read.xlsx("/home/woodrow/Projects/SilentSpring/Cardona_and_Rudel_Comments/Supplemental Excel File.xlsx", sheet=4, startRow=3)
CRe2chems <- CRe2chems[CRe2chems[,10] %in% c("higher", "intermediate", "lower"),]
setdiff(CRe2chems[,4], paste0("'",e2chem$casn))
## [1] "'NOCAS_47374"
Burgoon et al. note that we cannot assume that the “plate level effects seen in DMSO will be the same as the plate-level effects we see with the chemical of interest”, because there are generally no replicate plates for a chemical. Are there chemicals with replicate plates? If so, we could make a stab at this problem by looking at them. If control hormone levels track plate effects, then subtracting off the plate controls from responses to treatments should reduce the among-plate variation in treatment-specific response. Clearly, we need at least two replicate plates to test this.
tbl <- unclass(with(e2chem, table(chnm, apid)))
nplates <- apply(tbl,1,function(x) sum(x > 0))
names(nplates)[which(nplates > 1)]
## [1] "1-Chloroanthraquinone"
## [2] "17alpha-Estradiol"
## [3] "17beta-Estradiol"
## [4] "2-Ethoxy-5-(1-propenyl)phenol"
## [5] "2,2'-Dibenzoylaminodiphenyl disulfide"
## [6] "3-Methoxyphenol"
## [7] "3-Methyl-4-(methylthio)phenol"
## [8] "3-Phenylphenol"
## [9] "3,3'-Dichlorobenzidine dihydrochloride"
## [10] "4-(Diethylamino)benzaldehyde"
## [11] "4-Iodophenol"
## [12] "4-Pentylphenol"
## [13] "4'-Acetylbiphenyl"
## [14] "6-Methoxy-2-benzothiazolamine"
## [15] "Ametryn"
## [16] "Azinphos-ethyl"
## [17] "Bisphenol A"
## [18] "C.I. Disperse Yellow 3"
## [19] "C.I. Solvent Yellow 56"
## [20] "CI-1044"
## [21] "CP-457677"
## [22] "Deisopropylatrazine"
## [23] "Diphenyl isophthalate"
## [24] "Estrone"
## [25] "Indan-5-ol"
## [26] "Methiocarb"
## [27] "N-Phenyl-2-naphthylamine"
## [28] "Phenothiazine"
So, 28 chemicals have at least 2, and 2 have 3 plates, so we can evaluate for them the extent to which the plate effects estimated for DMSO track variation in treatment effects, as long as duplicate plates share the same chemical concentrations.
Let’s plot plate-specific control and a treatment mean, say lowest conc. in one plot, highest conc. in another, against each other (x = DMSO value, y = treatment value, one point per chem and plate.) First, build a dataframe with columns chemname, plate, DMSO, conc1, concn. The rows are mean rvals for the chemical and plate. I need to do these steps:
keep <- names(nplates)[which(nplates > 1)]
## I really should get comfortable with tibbles
e2chemm <- e2chem[e2chem$chnm %in% keep,c("chnm","apid","conc","lrval")]
e2chemm$group <- paste(e2chemm$chnm, e2chemm$apid, e2chemm$conc, sep=":")
mns <- tapply(e2chemm$lrval, list(e2chemm$group), mean)
e2mns <- do.call(rbind,lapply(strsplit(names(mns), split=":"), function(x) {
data.frame(chnm = x[1], apid = x[2], conc = as.numeric(x[3]))
}))
e2mns$lrvalmn <- unname(mns)
dmsomns <- tapply(e2dmso$lrval, list(e2dmso$apid), mean)
For each chemical, make two figures. One a plot of DMSO control and all responses against concentration, color coding the plates (apid). The second, below it, plots of response minus plate-level control. In each figure, show the pooled variance among the observations. To simplify this, make function that takes a chemical name (chnm) and produces the plot.
plotem <- function(chnm){
pdta1 <- e2mns[e2mns$chnm == chnm,c("apid","conc","lrvalmn")]
dmso <- dmsomns[pdta1$apid]
dmsodta <- data.frame(apid = names(dmso),
conc = rep(0, length(dmso)),
lrvalmn=unname(dmso))
rmse <- numeric(3)
rmse[3] <- sd(unique(dmso))
lm1 <- lm(lrvalmn ~ 0 + factor(conc), data=pdta1)
rmse[1] <- summary(lm1)$sigma
p1 <- ggplot(data=pdta1, aes(x = conc, y = lrvalmn, group=apid, color=apid)) +
geom_point() + geom_line() +
geom_hline(data=dmsodta, mapping=aes(yintercept=lrvalmn,
group=apid, color=apid), lty=2) +
scale_x_log10(expression(paste("Administered Test Chemical Concentration (", mu, "M)"))) +
scale_y_continuous(expression(paste("Ln E2 Conc. (", mu, "M)"))) +
scale_color_manual("Date and Plate #", values=c("#66c2a5", "#fc8d62", "#8da0cb")) +
labs(subtitle=paste0("RMSE=",signif(rmse[1], digits=2))) +
theme_bw()
pdta1b <- pdta1
pdta1b$lrvalmn <- pdta1b$lrvalmn - dmso[pdta1b$apid]
lm2 = lm(lrvalmn ~ 0 + factor(conc), data=pdta1b)
rmse[2] <- summary(lm2)$sigma
p2 <- ggplot(pdta1b, aes(x = conc, y = lrvalmn, group = apid, color=apid)) +
geom_point() + geom_line() +
scale_x_log10(expression(paste("Administered Test Chemical Concentration (", mu, "M)"))) +
scale_y_continuous("Ln Fold-Change WRT Plate Ctrl") +
scale_color_manual("Date and Plate #", values=c("#66c2a5", "#fc8d62", "#8da0cb")) +
labs(subtitle=paste0("RMSE=",signif(rmse[2], digits=2))) +
theme_bw()
list(fig = arrangeGrob(p1,p2,top=chnm, nrow=2),
RMSE = rmse)
}
Now, apply this to each chemical with multiple plates. Each chemical has two graphs. The graph above is just the log-transformed values versus concentration (on a log scale). The plate-specific mean value is plotted as a dashed line, coded by color. Below that figure is a graph of the same data, but each point has had its plate’s control value subtracted. The rmse value for each figure is the pooled standard deviation of the data around the concentration-specific means (see code above). Designate the rmse for the unnormalized data as rmse0, and for the normalized data, rmse1.
chems <- sort(unique(e2mns$chnm))
RMSE <- data.frame(chnm = chems,
rmse0 = numeric(length(chems)),
rmse1 = numeric(length(chems)),
dmsosd = numeric(length(chems)),
row.names = 1)
i <- 0
for (Chem in rownames(RMSE)) {
i <- i + 1
out <- plotem(Chem)
RMSE[Chem, ] <- out[["RMSE"]]
ggsave(paste0("A_",i,".pdf"), plot=out[["fig"]],
width = 8, height= 5, units="in", dpi=600)
plot(out[["fig"]])
}
The rmse after adjusting for plate effects using concurrent controls was reduced for 22 chemicals, and increased for 6 chemicals.
If concurrent controls track overall plate effects, then rmse1 should generally be smaller than rmse0, and the difference should be related to the variance among the plate controls – if plate controls are very similar, then adjusting for plate effects should have little effect, and may even increase the rmse a little because the plate effect is estimated with some error. However, as plates become more dissimilar, the difference in rmse should become more pronounced, with the rmse based on adjusting for plate effects becoming smaller. Here is a plot of RMSEunnormalized - RMSEnormalized versus the standard deviation among plate DMSO means for the chemical.
pdta <- data.frame(x = RMSE[,3], y = RMSE[,1] - RMSE[,2],
chnm = rownames(RMSE))
anotdta <- pdta[pdta$y < 0.1,]
p <- ggplot(data=pdta, mapping=aes(x = x, y = y)) +
geom_point() +
geom_text_repel(data=anotdta, mapping=aes(x = x, y = y, label=chnm)) +
xlab("SD Among Plate-Specific Control Ln E2 Values") +
ylab("RMSE Difference") +
theme_bw()
ggsave("B_1.pdf", plot = p, width = 6, height= 4, units="in", dpi=600)
plot(p)
Note that negative differences between rmses (that is, negative values on the y-axis) mean that the rmse gets worse when data are normalized by plate DMSO means.
For three of the six chemicals where the rmse increased after adjusting for plate-specific controls, the plate-specific controls had quite similar values (these are the three left-most data points in the figure above). This is totally consistent with the controls tracking plate effects. For three chemicals, normalization substantially increases the rmse: 17\(\alpha\)-Estradiol, 17\(\beta\)-Estradiol, and Estrone. It is not at all clear what is going on with these three chemicals, though it may be significant that these are all potent estrogens, and the measured estradiol values in their treatment groups are substantially greater than for the other chemicals in this analysis.
In conclusion, Burgoon et al.’s criticism of Cardona and Rudel (and implicitly of the two Haggard et al. papers), based on criticizing the use of plate-specific controls to normalize data, is wrong. Very clearly, plate-specific control values track plate-to-plate variation in all the measurements on the plate, so normalizing the estradiol data by the plate-specific control values adjusts for plate-to-plate variability.
Another criticism of Burgoon et al is that for about half the chemicals identified as potential estrogens in Cardona et al, the corresponding DMSO means fall in the lower \(25^{th}\) percentile. However, ToxCast did not assign chemicals to assays at random (whatever that may mean), so if there were a temporal trend in plate DMSO means, that could account for this pattern. The date of an assay is coded into the apid labeling the plate, so it is possible to examine this, using the object dmsomns created above. In the figure below, each point is the mean DMSO control value (based on log values) for a single plate plotted against the date encoded in that plate’s apid in the dataset. The horizontal dashed line is the mean of all the plate DMSO means.
Formats <- c("02Apr14" = "%d%B%y",
"02Apr2014" = "%d%B%Y",
"03Sep2014" = "%d%B%Y",
"04112017" = "%m%d%Y",
"09Apr2014" = "%d%B%Y",
"10Jun2015" = "%d%B%Y",
"20130320" = "%Y%m%d",
"20130321" = "%Y%m%d",
"26Mar2014" = "%d%B%Y")
Dates0 <- sapply(strsplit(names(dmsomns), split="\\."), function(x) x[1])
Dates <- as.Date(Dates0, format=Formats[Dates0])
pdta <- data.frame(date = Dates,
control = dmsomns)
p <- ggplot(data=pdta, mapping=aes(x = date, y = control)) +
geom_point() + xlab("Assay Date") + ylab("Plate DMSO Mean") +
geom_hline(yintercept = mean(dmsomns), lty=2) +
theme_bw()
plot(p)
The DMSO means clearly cluster in time, making it less implausible that more potent estrogens might be assigned to dates with coincidentally lower plate levels. For estradiol there are seven blocks (dates), and plate DMSO means fall into essentially two groups: “low” and “high”.
How many and what fraction of chemicals were evaluated in “low” blocks?
lowplates <- unique(rownames(pdta)[pdta$control < mean(dmsomns)])
lowchems <- unique(e2chem$chnm[e2chem$apid %in% lowplates])
nlowchem <- length(lowchems)
nchems <- length(unique(e2chem$chnm))
Approximately 57.1428571% of chemicals were evaluated in at least one of the ‘low’ blocks.
Burgoon noticed that 48% of the chemicals identified by Cardona and Rudel were associated with plates whose DMSO means were in the lower 25th percentile. How many chemicals (and more importantly, what fraction of the chemicals) were assigned to plates at or below the 25th percentile?
cutoff <- quantile(dmsomns, pr = 0.25)
lowplates <- unique(rownames(pdta)[pdta$control < cutoff])
lowchems <- unique(e2chem$chnm[e2chem$apid %in% lowplates])
nlowchem <- length(lowchems)
nchems <- length(unique(e2chem$chnm))
About 35.4497354% of chemicals in this analysis were associated with plates whose DMSO means were in the lower 25th percentile. This differs from Burgoon’s analysis, I think.
As shown above, assay values for treatments track control levels pretty closely, so this is not a problem for the utility of the assay, AS LONG AS plates are normalized against their own controls, which Haggard et al. did.
read the data from the data file, and compare the list of plates in p4chem to those in the dmso dataset.
p4chem <- read.delim("/home/woodrow/Projects/SilentSpring/Cardona_and_Rudel_Comments/MyAnalysis/active_withsubs_p4up_2034_597.csv")
p4dmso <- read.delim("/home/woodrow/Projects/SilentSpring/Cardona_and_Rudel_Comments/MyAnalysis/DMSO_p4_2034_597.csv")
p4plates <- sort(unique(p4chem$apid))
p4dmsoplates <- sort(unique(p4dmso$apid))
if (!all(p4plates %in% p4dmsoplates)) sort(setdiff(p4plates, p4dmsoplates)) else "All There!"
## [1] "All There!"
So, here we create a new variable in each dataframe, lrval, the log-transform of rval
p4chem$lrval <- log(p4chem$rval)
p4dmso$lrval <- log(p4dmso$rval)
CRp4chem <- read.xlsx("/home/woodrow/Projects/SilentSpring/Cardona_and_Rudel_Comments/Supplemental Excel File.xlsx", sheet=5, start=3)
CRp4chem <- CRp4chem[CRp4chem[,10] %in% c("higher", "intermediate", "lower"),]
setdiff(CRp4chem[,4], paste0("'",p4chem$casn))
## [1] "'NOCAS_47374" "2832-40-8"
tbl <- unclass(with(p4chem, table(chnm, apid)))
nplates <- apply(tbl,1,function(x) sum(x > 0))
names(nplates)[which(nplates > 1)]
## [1] "(+/-)-cis-Permethrin"
## [2] "1,4-Bis(butylamino)anthracene-9,10-dione"
## [3] "1,6-Hexanediol diacrylate"
## [4] "2-(1-Chlorocyclopropyl)-1-(2-chlorophenyl)-3-(1H-1,2,4-triazol-1-yl)-2-propanol"
## [5] "2-Ethoxy-5-(1-propenyl)phenol"
## [6] "2,2'-Dibenzoylaminodiphenyl disulfide"
## [7] "3-Phenylphenol"
## [8] "3,4-Dinitrotoluene"
## [9] "4-Fluoro-3-nitroaniline"
## [10] "4-Iodophenol"
## [11] "6-Methoxy-2-benzothiazolamine"
## [12] "Ametryn"
## [13] "Androsterone"
## [14] "Azinphos-ethyl"
## [15] "C.I. Disperse Yellow 3"
## [16] "CI-1044"
## [17] "Dexamethasone sodium phosphate"
## [18] "Disulfoton sulfone"
## [19] "Isofenphos"
## [20] "Mancozeb"
## [21] "N-Phenyl-2-naphthylamine"
## [22] "Prothioconazole"
## [23] "Tri-o-cresyl phosphate"
## [24] "Vatalanib dihydrochloride"
So, 24 chemicals have 2 plates, so we can evaluate for them the extent to which the plate effects estimated for DMSO track variation in treatment effects, as long as duplicate plates share the same chemical concentrations.
Let’s plot plate-specific control and a treatment mean, say lowest conc. in one plot, highest conc. in another, against each other (x = DMSO value, y = treatment value, one point per chem and plate.) First, build a dataframe with columns chemname, plate, DMSO, conc1, concn. The rows are mean rvals for the chemical and plate. I need to do these steps:
keep <- names(nplates)[which(nplates > 1)]
## I really should get comfortable with tibbles
p4chemm <- p4chem[p4chem$chnm %in% keep,c("chnm","apid","conc","lrval")]
p4chemm$group <- paste(p4chemm$chnm, p4chemm$apid, p4chemm$conc, sep=":")
mns <- tapply(p4chemm$lrval, list(p4chemm$group), mean)
p4mns <- do.call(rbind,lapply(strsplit(names(mns), split=":"), function(x) {
data.frame(chnm = x[1], apid = x[2], conc = as.numeric(x[3]))
}))
p4mns$lrvalmn <- unname(mns)
p4dmsomns <- tapply(p4dmso$lrval, list(p4dmso$apid), mean)
For each chemical, make two figures. One a plot of DMSO control and all responses against concentration, color coding the plates (apid). The second, below it, plots of response minus plate-level control. In each figure, show the pooled variance among the observations. To simplify this, make function that takes a chemical name (chnm) and produces the plot.
plotem <- function(chnm){
pdta1 <- p4mns[p4mns$chnm == chnm,c("apid","conc","lrvalmn")]
dmso <- p4dmsomns[pdta1$apid]
dmsodta <- data.frame(apid = names(dmso),
conc = rep(0, length(dmso)),
lrvalmn=unname(dmso))
rmse <- numeric(3)
rmse[3] <- sd(unique(dmso))
lm1 <- lm(lrvalmn ~ 0 + factor(conc), data=pdta1)
rmse[1] <- summary(lm1)$sigma
p1 <- ggplot(data=pdta1, aes(x = conc, y = lrvalmn, group=apid, color=apid)) +
geom_point() + geom_line() +
geom_hline(data=dmsodta, mapping=aes(yintercept=lrvalmn,
group=apid, color=apid), lty=2) +
scale_x_log10(expression(paste("Administered Test Chemical Concentration (", mu, "M)"))) +
scale_y_continuous(expression(paste("Ln P4 Conc. (", mu, "M)"))) +
scale_color_manual("Date and Plate #", values=c("#66c2a5", "#fc8d62", "#8da0cb")) +
labs(subtitle=paste0("RMSE=",signif(rmse[1], digits=2))) +
theme_bw()
pdta1b <- pdta1
pdta1b$lrvalmn <- pdta1b$lrvalmn - dmso[pdta1b$apid]
lm2 = lm(lrvalmn ~ 0 + factor(conc), data=pdta1b)
rmse[2] <- summary(lm2)$sigma
p2 <- ggplot(pdta1b, aes(x = conc, y = lrvalmn, group = apid, color=apid)) +
geom_point() + geom_line() +
scale_x_log10(expression(paste("Administered Test Chemical Concentration (", mu, "M)"))) +
scale_y_continuous("Ln Fold-Change WRT Plate Ctrl") +
scale_color_manual("Date and Plate #", values=c("#66c2a5", "#fc8d62", "#8da0cb")) +
labs(subtitle=paste0("RMSE=",signif(rmse[2], digits=2))) +
theme_bw()
list(fig = arrangeGrob(p1,p2,top=chnm, nrow=2),
RMSE = rmse)
}
Now, apply this to each chemical with multiple plates. Each chemical has two graphs. The graph above is just the log-transformed values versus concentration (on a log scale). The plate-specific mean value is plotted as a dashed line, coded by color. Below that figure is a graph of the same data, but each point has had its plate’s control value subtracted. The rmse value for each figure is the pooled standard deviation of the data around the concentration-specific means (see code above). Designate the rmse for the unnormalized data as rmse0, and for the normalized data, rmse1.
chems <- sort(unique(p4mns$chnm))
RMSE <- data.frame(chnm = chems,
rmse0 = numeric(length(chems)),
rmse1 = numeric(length(chems)),
dmsosd = numeric(length(chems)),
row.names = 1)
i <- 0
for (Chem in rownames(RMSE)) {
i <- i + 1
out <- plotem(Chem)
RMSE[Chem, ] <- out[["RMSE"]]
ggsave(paste0("C_",i,".pdf"), plot=out[["fig"]],
width = 8, height= 5, units="in", dpi=600)
plot(out[["fig"]])
}
The rmse after adjusting for plate effects using concurrent controls was reduced for 22 chemicals, and increased for 2 chemicals.
If concurrent controls track overall plate effects, then rmse1 should generally be smaller than rmse0, and the difference should be related to the variance among the plate controls – if plate controls are very similar, then adjusting for plate effects should have little effect, and may even increase the rmse a little because the plate effect is estimated with some error. However, as plates become more dissimilar, the difference in rmse should become more pronounced, with the rmse based on adjusting for plate effects becoming smaller. Here is a plot of RMSEunnormalized - RMSEnormalized versus the standard deviation among plate DMSO means for the chemical.
pdta <- data.frame(x = RMSE[,3], y = RMSE[,1] - RMSE[,2],
chnm = rownames(RMSE))
anotdta <- pdta[pdta$y < 0.2,]
p <- ggplot(data=pdta, mapping=aes(x = x, y = y)) +
geom_point() +
geom_text_repel(data=anotdta, mapping=aes(x = x, y = y, label=chnm)) +
xlab("SD Among Plate-Specific Control Ln E2 Values") +
ylab("RMSE Difference") +
theme_bw()
ggsave("D_1.pdf", plot = p, width = 6, height= 4, units="in", dpi=600)
plot(p)
Note that negative differences between rmses (that is, negative values on the y-axis) mean that the rmse gets worse when data are normalized by plate DMSO means.
Formats <- c("02Apr14" = "%d%B%y",
"02Apr2014" = "%d%B%Y",
"03Sep2014" = "%d%B%Y",
"04112017" = "%d%m%Y",
"09Apr2014" = "%d%B%Y",
"10Jun2015" = "%d%B%Y",
"20130320" = "%Y%m%d",
"20130321" = "%Y%m%d",
"26Mar2014" = "%d%B%Y")
Dates0 <- sapply(strsplit(names(p4dmsomns), split="\\."), function(x) x[1])
Dates <- as.Date(Dates0, format=Formats[Dates0])
pdta <- data.frame(date = Dates,
control = p4dmsomns)
p <- ggplot(data=pdta, mapping=aes(x = date, y = control)) +
geom_point() + xlab("Assay Date") + ylab("Plate DMSO Mean") +
geom_hline(yintercept = mean(dmsomns), lty=2) +
theme_bw()
plot(p)